Data analysis of EAC data

We look into the 3-year dataset provided by EAC concerning load consumption/generation.

Assumptions made:

  • The total supplied in the Transmission Generation Excel file is the total conventional generation.
  • A positive value of the TCC means exporting power.
  • The 'TSOC ESTIMATED DISTR. PV GENERATED' is the total for the whole of Cyprus.
  • The total load can be estimated by: $$Load = TOTAL\ Conventional + PV + WIND - EXPORT$$
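As a sanity check, the load estimate can be sketched with made-up numbers (column names follow the notebook; the values are hypothetical):

```python
import pandas as pd

# Hypothetical 15-minute snapshot (values in MW, made up for illustration)
row = pd.Series({"TOT": 500.0, "WIND": 60.0, "PV": 120.0, "TCCEX": 30.0})

# Load = conventional + wind + PV, minus exported power (positive TCCEX = export)
load = row["TOT"] + row["WIND"] + row["PV"] - row["TCCEX"]
print(load)  # 650.0
```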

Initialise and read data

In [1]:
from datetime import timedelta
import pandas as pd
import matplotlib.pyplot as plt
import cufflinks as cf
import chart_studio.plotly as py
import plotly.offline
cf.go_offline()
cf.set_config_file(offline=False, world_readable=True)
In [2]:
exc_file = 'Transmission Generation.xlsx'
years = ['032018','042018','032019','042019','032020','042020']
names = ['WIND','TOT','SYNCAP','TCC', 'TCCEX']
cols = 'C,G,I,K,M,O'
In [3]:
generation = pd.read_excel(exc_file, sheet_name=years, usecols = cols, skiprows = 4,  index_col = 0, names = names)
In [4]:
# read_excel with a list of sheet names returns a dict of DataFrames keyed by sheet,
# so the March and April sheets of each year can be concatenated directly
df18 = pd.concat([generation['032018'], generation['042018']])
df19 = pd.concat([generation['032019'], generation['042019']])
df20 = pd.concat([generation['032020'], generation['042020']])
In [5]:
start_date = '2020-03-01'
end_date = '2020-04-09'
df20['date'] = pd.to_datetime(df20.index)
mask = (df20['date'] > start_date) & (df20['date'] <= end_date)
df20 = df20.loc[mask]
del df20['date']
In [6]:
#print(df20)
In [7]:
exc_file = 'PVs Mar.xlsx'
years = ['2018','2019','2020']
names = ['PV']
cols = 'C,O'
pv_mar = pd.read_excel(exc_file, sheet_name=years, usecols = cols, skiprows = 4,  index_col = 0, names = names)
In [8]:
exc_file = 'PVs Apr.xlsx'
years = ['2018','2019','2020']
names = ['PV']
cols = 'C,O'
pv_apr = pd.read_excel(exc_file, sheet_name=years, usecols = cols, skiprows = 4,  index_col = 0, names = names)
In [9]:
df18pv = pd.concat([pv_mar['2018'], pv_apr['2018']])
df19pv = pd.concat([pv_mar['2019'], pv_apr['2019']])
df20pv = pd.concat([pv_mar['2020'], pv_apr['2020']])
In [10]:
df20pv['date'] = pd.to_datetime(df20pv.index)
mask = (df20pv['date'] > start_date) & (df20pv['date'] <= end_date)
df20pv = df20pv.loc[mask]
del df20pv['date']
In [11]:
#df18pv = df18pv.between_time('05:00', '20:00')
#df19pv = df19pv.between_time('05:00', '20:00')
#df20pv = df20pv.between_time('05:00', '20:00')
In [12]:
#print(df20pv)
In [13]:
df18t = df18.merge(df18pv, left_index=True, right_index=True)
df19t = df19.merge(df19pv, left_index=True, right_index=True)
df20t = df20.merge(df20pv, left_index=True, right_index=True)
In [14]:
#print(df18t)
In [15]:
df18t["LOAD18"] = df18t["TOT"] + df18t["WIND"] + df18t["PV"] - df18t["TCCEX"]
df19t["LOAD19"] = df19t["TOT"] + df19t["WIND"] + df19t["PV"] - df19t["TCCEX"]
df20t["LOAD20"] = df20t["TOT"] + df20t["WIND"] + df20t["PV"] - df20t["TCCEX"]
In [16]:
#print(df20t)

Look into each year

We plot the data of each year individually.

In [17]:
df18t.iplot(kind='lines',xTitle='Dates',yTitle='MW',title='2018')
In [18]:
df19t.iplot(kind='lines',xTitle='Dates',yTitle='MW',title='2019')
In [19]:
df20t.iplot(kind='lines',xTitle='Dates',yTitle='MW',title='2020')

Comparison

We first shift the three time series so that their weekdays align. In 2020, the 1st of March was a Sunday; in 2019 it was a Friday; and in 2018 it was a Thursday. The shift ensures that we compare the first Sunday of March 2020 with the first Sunday of March 2019 and 2018, and likewise for all other days.

In [20]:
df18t["X_DATE"] = pd.to_datetime(df18t.index + timedelta(days=2*365-2))
df18t.set_index('X_DATE', inplace=True)
df19t["X_DATE"] = pd.to_datetime(df19t.index + timedelta(days=365-1))
df19t.set_index('X_DATE', inplace=True)
df20t["X_DATE"] = pd.to_datetime(df20t.index)
df20t.set_index('X_DATE', inplace=True)
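The offsets used above (2·365−2 = 728 days for 2018 and 365−1 = 364 days for 2019) can be checked directly with the standard library: both map the first Sunday of March onto 1 March 2020.

```python
from datetime import date, timedelta

# First Sundays of March in each year
sun18, sun19, sun20 = date(2018, 3, 4), date(2019, 3, 3), date(2020, 3, 1)

# Apply the same offsets as in the notebook
print(sun18 + timedelta(days=2 * 365 - 2))  # 2020-03-01
print(sun19 + timedelta(days=365 - 1))      # 2020-03-01
print(sun20.strftime("%A"))                 # Sunday
```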
In [21]:
#print(df18t,df19t,df20t)

We create a joint table with the overlapping LOAD data. Only the common dates are shown. Since we shifted 2018 by 3 days to make its start date coincide with a Sunday, the plot only extends until March 28.

In [22]:
alldf = df18t[['LOAD18']]
alldf = alldf.merge(df19t[['LOAD19']], left_index=True, right_index=True)
alldf = alldf.merge(df20t[['LOAD20']], left_index=True, right_index=True)
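The merges above join on the index and default to an inner join, which is why only the timestamps present in all three frames survive. A minimal illustration with hypothetical dates:

```python
import pandas as pd

a = pd.DataFrame({"LOAD18": [1.0, 2.0, 3.0]},
                 index=pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"]))
b = pd.DataFrame({"LOAD19": [4.0, 5.0]},
                 index=pd.to_datetime(["2020-03-02", "2020-03-03"]))

# merge on the indices: default how='inner' keeps only the shared dates
joint = a.merge(b, left_index=True, right_index=True)
print(joint.index.tolist())  # only 2020-03-02 and 2020-03-03 remain
```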
In [23]:
#print(alldf)
In [24]:
alldf.iplot(kind='lines',xTitle='Dates',yTitle='MW',title='15-minute intervals')
In [25]:
alldf.resample('D').mean().iplot(kind='lines',xTitle='Dates',yTitle='MW',title='Daily average')
In [26]:
alldf.resample('D').max().iplot(kind='lines',xTitle='Dates',yTitle='MW',title='Daily max')
In [27]:
alldf.resample('D').min().iplot(kind='lines',xTitle='Dates',yTitle='MW',title='Daily min')
In [28]:
alldf.resample('W').mean().iplot(kind='lines',xTitle='Dates',yTitle='MW',title='Weekly average')
In [29]:
alldf.between_time('08:00', '18:00').resample('D').mean().iplot(kind='lines',xTitle='Dates',yTitle='MW',title='Daily average work hours only')
In [30]:
alldfWmean = alldf.between_time('08:00', '18:00').resample('W').mean()
alldfWmean.iplot(kind='lines',xTitle='Dates',yTitle='MW',title='Weekly average work hours only')
In [31]:
alldfWmean['perc'] = (alldfWmean['LOAD20'] / alldfWmean['LOAD19'])*100.0
In [32]:
alldfWmean.iplot(kind='lines',xTitle='Dates',yTitle='%',y='perc',title='Percentage change from 2019 to 2020')
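The ratio plotted above expresses the 2020 weekly work-hour load as a percentage of the corresponding 2019 week, so values below 100 indicate a reduction. With made-up weekly means:

```python
# Hypothetical weekly mean loads in MW (illustrative values only)
load19, load20 = 600.0, 480.0

perc = (load20 / load19) * 100.0
print(perc)  # 80.0, i.e. a 20% reduction relative to 2019
```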
In [34]:
alldfWmean.diff(periods=1).iplot(kind='bar',xTitle='Dates',yTitle='MW',y='LOAD20',title='2020 Weekly change')